1.  INTRODUCTION


This  report  is  prepared  at  the  end  of  the  fifth  year  of  a  6-year  project  (originally  a
4-year  project  with  a  two-year  extension  to  25  October  2002).  Therefore,  the  sections  for
1.1  Subject  and  Scope  of  the  Research,  1.2  Purpose,  1.3  Background  of  Previous  Work,
and  2.1  Experimental  Methods,  Assumptions  and  Procedures  (pages  2-16)  are  the  same
as  those  in  last  year’s  report.  In  the  subsequent  sections,  some  parts  of  last  year’s
report  are  kept  the  same  for  continuity.  New  reports  are  written  in  italics.

1.1.  The  Subject  and  Scope  of  the  Research 

Breast  cancer  is  a  leading  cause  of  death  in  women,  with  an  estimated  46,000 
deaths  per  year  in  the  United  States  (ref.  1).  Mammography  is  currently  the  only  known 
reliable  method  for  early  detection  of  breast  cancer  (refs.  2,3).  However,  early 
mammographic  signs  of  breast  cancer  such  as  clustered  microcalcifications  and  masses 
are  usually  very  subtle,  and  thus  10-30%  of  lesions  are  missed  even  by  trained 
radiologists.  These  misses  are  due  to  the  often  low  conspicuity  of  lesions,  eye  fatigue, 
and  human  error  (refs.  4-7).  However,  there  is  clear  evidence  (refs.  8,9)  that  radiologists' 
accuracy  in  the  detection  of  subtle  breast  lesions  would  be  improved  if  a  computer  output 
indicating  possible  sites  of  suspicious  lesions  were  made  available  to  radiologists  as  a 
"second  opinion." 

As  a  team  of  investigators  at  the  Kurt  Rossmann  Laboratories  for  Radiologic 
Image  Research  at  the  University  of  Chicago,  we  have  been  involved  since  1985  in 
developing  the  concepts  and  methodology  of  computer-aided  diagnosis  (CAD)  with 
which  to  assist  radiologists  in  detecting  lesions  and  improving  the  sensitivity  of  breast 
cancer  diagnosis  through  mammography  (refs.  10,  11).  CAD  may  be  defined  as  a 
diagnosis  made  by  a  radiologist  who  takes  into  account  the  results  of  automated  computer 
analyses  of  radiographic  images.  The  computer  output  may  be  used  as  a  "second 
opinion."  We  have  extensive  experience  in  developing  CAD  schemes.  In  addition  to 
breast  cancer,  we  have  developed  computer  schemes  for  the  detection  of  lung  nodules
(refs.  12,  13),  interstitial  infiltrates  (refs.  14,  15),  cardiomegaly  (refs.  16,  17)  and
pneumothorax  (ref.  18)  in  chest  radiography;  the  detection  of  stenotic  lesions  and  blood 
flow  analysis  in  angiography  (refs.  19,  20);  and  the  assessment  of  osteoporosis  in  skeletal 
radiography  (ref.  21). 

In  mammography,  CAD  schemes  are  being  developed  for  detection  of  clustered 
microcalcifications  (refs.  8,  24-26,  28)  and  for  detection  of  masses  (refs.  22,  23).  A  basic 
scheme  for  automated  detection  of  clustered  microcalcifications  employs  a  difference 
image  technique  to  enhance  the  signal-to-noise  ratio  of  microcalcifications,  followed  by 
thresholding,  feature  extraction  and  classification  using  artificial  neural  networks.  At 
present,  the  performance  of  this  scheme  provides  a  sensitivity  of  approximately  85%  in 
the  detection  of  clustered  microcalcifications  with  a  false  positive  rate  of  approximately 
0.7  per  mammogram  when  it  is  tested  on  our  database  of  78  mammograms,  in  which  one 
half  are  normal  cases  and  the  other  half  includes  subtle  clustered  microcalcifications. 
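The difference-image and thresholding steps described above can be illustrated with a small sketch. It is not the laboratory's implementation: the 3x3 mean filter standing in for the signal-enhancing filter, the 5x5 median filter standing in for the signal-suppressing filter, the threshold level, and the toy image are all assumptions chosen only to show how subtracting the two filtered images removes a slowly varying background while keeping a small bright signal.

```python
from statistics import mean, median

def window(img, r, c, half):
    """Values in the (2*half+1)-square window around (r, c), clipped at the border."""
    rows, cols = len(img), len(img[0])
    return [img[i][j]
            for i in range(max(0, r - half), min(rows, r + half + 1))
            for j in range(max(0, c - half), min(cols, c + half + 1))]

def difference_image(img):
    """Signal-enhanced image minus signal-suppressed image, pixel by pixel."""
    rows, cols = len(img), len(img[0])
    return [[mean(window(img, r, c, 1))       # assumed enhancement: 3x3 mean
             - median(window(img, r, c, 2))   # assumed suppression: 5x5 median
             for c in range(cols)]
            for r in range(rows)]

def threshold(diff, level):
    """Coordinates of pixels whose difference value exceeds the level."""
    return [(r, c)
            for r, row in enumerate(diff)
            for c, v in enumerate(row)
            if v > level]

# Toy 7x7 image: a slowly varying background plus one small bright spot.
image = [[2 * c for c in range(7)] for _ in range(7)]
image[3][3] += 90

candidates = threshold(difference_image(image), level=5)
print(candidates)
```

In this toy case the background ramp largely cancels in the subtraction, so only pixels next to the bright spot survive the threshold. Feature extraction and neural-network classification would then operate on such candidates.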

For  the  automated  detection  of  mammographic  masses,  another  CAD  scheme  is 
being  developed  on  the  basis  of  a  bilateral  subtraction  technique  that  analyzes  deviations 
of  architectural  symmetry  between  the  right  and  left  breast  images,  with  asymmetries 
indicating  potential  masses  (refs.  29-32).  Currently,  this  scheme  performs  at 
approximately  90%  sensitivity  with  a  false  positive  rate  of  about  2  per  case  when  it  is 
tested  on  our  database  of  154  pairs  of  mammograms.  Our  current  research  effort  on  these 
CAD  schemes  is  focused  primarily  on  further  improving  their  performance  through
careful  analysis  of  computer  false-positives  and  false-negatives. 

To  date,  these  studies  have  been  performed  retrospectively  on  selected  sets  of 
mammograms,  and  we  have  obtained  results  that  indicate  that  our  schemes  have  the 
potential  to  be  used  as  an  effective  aid  for  radiologists.  We  are  now  at  the  stage  in  the
development  of  our  CAD  program  at  which  our  schemes  can  be  tested  prospectively  on  a  large  number  of
clinical  mammograms.


On  November  8th,  1994,  we  implemented  an  “intelligent”  mammography 
workstation  (ref.  34)  and  began  the  first  test  of  our  schemes  on  clinical  screening 
mammograms  obtained  in  the  mammography  section  of  our  department.  This  workstation 
consists  of  an  IBM  Powerstation  590,  a  Konica  LD4500  laser  film  digitizer,  an  Alphatronix 
Inspire  40-GB  magneto-optical  jukebox,  two  Imlogix  1024  line  monitors,  and  a  Seikosha  VP 
4500  video  printer  for  hard  copy.  The  “intelligence”  of  the  workstation  comes  from  our 
automated  detection  schemes  for  clustered  microcalcifications  and  masses. 

In  order  to  realize  mammographic  CAD  clinically  and  practically  for  detection  of
breast  cancers  in  screening  programs,  it  is  necessary  to  have  commercial  products  for 
widespread  use  by  radiologists  in  breast  clinics,  community  hospitals,  and  academic 
medical  centers.  Therefore,  in  1993,  ARCH  Development  Corporation  (ARCH),  which 
is  a  not-for-profit  organization  created  by  the  Board  of  Trustees  of  the  University  of 
Chicago  in  1986  as  a  unique  mechanism  to  commercialize  inventions  developed  by  the 
faculty  at  the  University  of  Chicago  and  by  scientists  at  Argonne  National  Laboratory, 
licensed  its  inventions  on  CAD  and  related  technologies  to  R2  Technology,  Inc. 

R2  Technology,  Inc.  was  founded  in  1993  with  the  specific  goal  of  developing
and  marketing  a  computer-aided  diagnostic  system  for  mammographic  detection  of  breast
cancer.  R2  Technology,  Inc.  has  been  funded  by  leading  venture  firms  in  Silicon  Valley
(Sigma  Partners  and  Burr,  Egan,  Deleage  and  Co.)  that  have  supported  many  other
successful  medical  and  computer  companies.  Its  development  group  has  over  250
man-years  of  experience  in  medical  imaging  and  computer  systems.  Its  product  development
process  has  been  to  identify  high-potential  prototype  systems  in  leading  research
institutions,  form  alliances,  and  integrate  those  systems  into  R2’s  core  technology.  As
of  May  1995,  R2  has  alliances  with  the  University  of  Chicago,  Lockheed  Missiles  &
Space  Company,  Inc.,  and  Sandia  National  Laboratories.

Therefore,  the  next  logical  step  in  the  development  of  CAD  is  to  conduct  a
large-scale,  multi-institutional  demonstration  project  to  examine  whether  additional  breast
cancers  can  be  found  by  use  of  mammographic  CAD  workstations.  We  believe  that  the
performance  of  mammographic  CAD  schemes  has  reached  the  high  level  necessary  for 
clinical  evaluation.  Serious  efforts  toward  commercialization  of  CAD  units  have  already 
begun.  Therefore,  it  is  likely  that  a  clear  positive  outcome  from  this  study  would  result  in 
production  of  commercial  products  for  widespread  use  in  breast  imaging  and  would  lead 
to  large-scale  clinical  trials. 

1.2.  Purpose 

The  goal  of  this  project  is  to  demonstrate  the  clinical  usefulness  of
computer-aided  diagnosis  (CAD)  in  mammographic  detection  of  breast  cancer  through
multidisciplinary  and  multi-institutional  efforts.  We  plan  to  develop  clinical  prototype
mammography  workstations  for  automated  detection  of  suspicious  lesions  in 
mammograms  by  incorporating  image  processing  techniques  and  artificial  neural 
networks.  The  prototype  workstation  will  be  used  as  a  “second  opinion”  to  assist 
radiologists’  interpretation  of  mammograms.  Clinical  usefulness  of  the  mammography 
CAD  will  be  demonstrated  and  evaluated  at  four  hospitals  in  the  Chicago  area.  The 
major  hypothesis  to  be  tested  in  this  proposal  is  that  CAD  improves  accuracy  in  the 
detection  of  breast  cancer  by  reducing  observational  errors  on  mammographic  images. 
Our  proposal  is  designed  to  demonstrate  that  approximately  23  additional  breast  cancers 
will  be  detected  among  approximately  45,000  screenees  due  to  the  use  of  CAD  computer 
output. 

The  specific  aims  of  this  demonstration  project  are  listed  below. 

(1)  Further  development  of  advanced  CAD  schemes  for  detection  of  breast  lesions 

(a)  Automated  detection  scheme  for  clustered  microcalcifications 

(b)  Automated  detection  scheme  for  masses 

(c)  Automated  scheme  for  characterization  of  detected  breast  lesions

(2)  Development  of  the  prototype  mammography  CAD  workstations  by  integration  of
laser  digitizer,  high-speed  computer,  and  advanced  CAD  schemes

(3)  Clinical  demonstration  and  evaluation  of  prototype  mammography  CAD 
workstations  at  two  hospitals:  one  academic  institution  and  one  community  hospital 

(4)  Analysis  of  outcomes  of  the  clinical  evaluation  of  the  prototype  workstations  for 
detection  of  additional  breast  cancers  by  the  use  of  computer  output 

1.3.  Background  of  Previous  Work 

We  have  been  working  on  the  development  of  computer-aided  diagnostic  (CAD) 
schemes  for  mammography,  chest  radiography,  angiography,  and  bone  radiography  since 
1985.  Therefore,  we  have  extensive  experience  in  quantitative  analysis  of  radiographic 
images  for  detection  and  characterization  of  various  patterns  based  on  computer-vision 
methods  and  artificial  neural  networks.  These  extensive  studies  provide  the  basis  for  the 
continued  development  and  testing  of  advanced  CAD  schemes  for  the  detection  of  breast 
lesions  proposed  in  this  research.  A  number  of  investigations  which  are  relevant  to  this 
study  are  described  briefly  here. 

(1)  Development  of  computerized  detection  scheme  for  mammographic 
microcalcifications 

We  have  investigated  the  application  of  computer-based  methods  to  the  detection 
of  microcalcifications  on  digital  mammograms.  Our  computer  detection  system  was 
based  on  a  difference-image  technique  in  which  a  signal-suppressed  image  was 
subtracted  from  a  signal-enhanced  image  to  remove  the  structured  background  in  a 
mammogram  (ref.  24).  Signal-extraction  techniques  adapted  to  the  known  physical 
characteristics  of  microcalcifications  were  then  used  to  isolate  microcalcifications  from 
the  remaining  noise  background  (ref.  25).  Signal-extraction  criteria  based  on  the  size, 
contrast,  number,  texture,  and  clustering  properties  of  microcalcifications  were  next 
imposed  on  the  detected  signals  to  distinguish  true  signals  from  noise  or  artifacts  (refs.  8,
25).  The  detection  accuracy  of  the  computer  scheme  was  evaluated  by  means  of  a
free-response  receiver  operating  characteristic  (FROC)  analysis.  In  a  study  of  78  clinical
images  containing  subtle  microcalcifications,  the  automated  computer  scheme  obtained 
an  85%  true-positive  cluster  detection  rate  at  a  false-positive  detection  rate  of  1.5  clusters 
per  image.  These  results  indicated  that  the  automated  method  has  the  potential  to  aid 
radiologists  in  screening  mammograms  for  clustered  microcalcifications. 
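The kind of operating point quoted above (a true-positive cluster detection rate at a given number of false-positive clusters per image) is one point on an FROC curve. The sketch below shows how such points could be computed from scored detections. The function name, the score values, and the simplifying assumption that each true cluster is detected at most once are illustrative assumptions and are not taken from the studies cited.

```python
def froc_points(detections, n_true, n_images):
    """Return (false positives per image, sensitivity) for each score threshold.

    detections: list of (score, is_true_positive) pairs pooled over all images.
    n_true:     number of true clusters in the test set.
    n_images:   number of images in the test set.
    Assumes each true cluster appears at most once among the detections.
    """
    points = []
    for cutoff in sorted({score for score, _ in detections}, reverse=True):
        kept = [is_tp for score, is_tp in detections if score >= cutoff]
        true_pos = sum(kept)
        false_pos = len(kept) - true_pos
        points.append((false_pos / n_images, true_pos / n_true))
    return points

# Hypothetical detections on 2 images containing 4 true clusters in total.
detections = [(0.9, True), (0.8, False), (0.7, True), (0.4, False), (0.3, True)]
curve = froc_points(detections, n_true=4, n_images=2)
for fp_per_image, sensitivity in curve:
    print(fp_per_image, sensitivity)
```

Lowering the threshold moves along the curve toward higher sensitivity at the cost of more false positives per image.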

We  have  applied  a  shift-invariant  neural  network  (SIANN)  to  eliminate
false-positive  detections  reported  by  the  CAD  scheme.  The  SIANN  is  a  layered  feed-forward
neural  network  with  local,  spatially-invariant  interconnections  (refs.  27,  28).  The  basic 
idea  of  local,  spatially-invariant  interconnections  (or  sharing  local  interconnection 
weights)  was  first  introduced  by  Fukushima  in  his  Neocognitron  for  recognition  of 
handwriting  characters  in  the  early  1980s  (refs.  35,  36).  The  SIANN  developed  by 
Zhang  et  al.  (ref.  27)  for  image  processing  is  a  feed-forward  neural  network  without  the 
lateral  interconnections  and  feedback  loops  that  are  included  in  the  Neocognitron. 
Furthermore,  a  modified  error  backpropagation  (EBP)  algorithm  with  the  shift-invariant- 
connection  constraint  (ref.  27)  is  used  as  the  training  algorithm  in  the  SIANN.  The 
SIANN  has  been  shown  to  be  a  powerful  tool  for  pattern  recognition  and  image 
processing,  since  it  can  learn  to  discriminate  between  objects  on  the  basis  of  local 
features  with  results  that  are  invariant  to  translation  of  the  objects  (refs.  27,  28). 
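The effect of sharing local interconnection weights can be seen in a one-dimensional toy example. This sketch illustrates only the weight-sharing idea. It is not the SIANN architecture or its training algorithm, and the kernel and signal values are arbitrary assumptions.

```python
def shared_weight_layer(signal, kernel):
    """Apply the same local weights at every position (valid 1-D correlation)."""
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]

kernel = [1, 2, 1]                      # one set of weights, reused everywhere
signal = [0, 0, 1, 3, 1, 0, 0, 0]       # a small "object"
shifted = [0, 0, 0, 0, 1, 3, 1, 0]      # the same object moved 2 samples right

out = shared_weight_layer(signal, kernel)
out_shifted = shared_weight_layer(shifted, kernel)

# The response to the object is identical; only its position moves with the object.
print(out)          # [1, 5, 8, 5, 1, 0]
print(out_shifted)  # [0, 0, 1, 5, 8, 5]
```

Because the response depends only on the local pattern and not on where it occurs, a detector built on such layers gives the same answer for an object regardless of its position within the region of interest.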

This  neural  network  was  trained  to  detect  each  individual  microcalcification  in  a 
given  region  of  interest  (ROI)  reported  by  the  CAD  scheme.  A  ROI  was  classified  as  a 
positive  ROI  if  the  total  number  of  microcalcifications  detected  in  the  ROI  was  greater 
than  two.  The  performance  of  the  shift-invariant  neural  network  was  evaluated  by  means 
of  a  jack-knife  method  and  conventional  receiver  operating  characteristic  (ROC)  analysis 
by  using  a  database  of  168  ROIs  that  had  been  reported  by  the  CAD  scheme  when 
applied  to  39  mammograms.  The  analysis  yielded  an  average  area  under  the  ROC  curve 
(Az)  of  0.91.  Approximately  55%  of  false-positive  ROIs  were  eliminated  without  any
loss  of  true-positive  ROIs  (ref.  28).  This  result  was  considerably  better  than  that  obtained
in  our  previous  study  using  a  conventional  three-layer,  feed-forward  neural  network. 

We  have  also  studied  radiologists'  detection  of  clustered  microcalcifications  on 
mammograms  to  determine  whether  CAD  can  improve  radiologists'  performance.  The 
results  of  a  ROC  study  showed  that  CAD,  using  the  level  of  computer  performance  at  that 
time  (sensitivity  =  87%,  4  false  clusters  per  image),  does  significantly  (p<0.001)  improve
radiologists'  accuracy  in  detecting  clustered  microcalcifications  under  conditions  that 
simulate  the  rapid  interpretation  of  screening  mammograms  (ref.  8).  The  results  also 
suggested  that  a  reduction  in  the  computer's  false-positive  rate  would  further  improve 
radiologists'  diagnostic  accuracy. 

The  importance  of  our  findings  is  that  a  computerized  scheme  can  detect  clustered 
microcalcifications  in  digitized  mammograms  at  a  high  level  of  sensitivity  that  would  be 
comparable  to  levels  obtained  by  radiologists.  In  addition,  radiologists'  performance  in 
the  detection  of  clustered  microcalcifications  can  be  improved  significantly  when  the 
results  of  the  computer  output  are  provided  as  an  aid  to  the  radiologists. 

(2)  Development  of  computerized  detection  schemes  for  mammographic  masses 

A  computerized  scheme  has  been  developed  for  the  detection  of  masses  in  digital 
mammograms.  Based  on  deviations  from  the  normal  architectural  symmetry  of  the  right 
and  left  breasts,  a  bilateral  subtraction  technique  was  used  to  enhance  the  conspicuity  of 
possible  masses.  The  scheme  employed  pairs  of  conventional  screen-film  mammograms 
(right  and  left  MLO  views  and  right  and  left  CC  views),  which  were  digitized  by  a  TV 
camera/Gould  digitizer.  The  right  and  left  breast  images  in  each  pair  were  aligned 
manually  during  digitization.  A  nonlinear  bilateral  subtraction  technique,  which  involves 
linking  multiple  subtracted  images,  was  investigated  and  compared  to  a  simple  linear 
subtraction  method  (refs.  29,  30).  Various  feature-extraction  techniques  were  used  to 
reduce  false-positive  detections  resulting  from  the  bilateral  subtraction.  The  scheme  was
evaluated  using  46  pairs  of  clinical  mammograms  and  was  found  to  yield  a  95%
true-positive  rate  at  an  average  of  three  false-positive  detections  per  image.  This  preliminary
study  indicated  that  the  scheme  is  potentially  useful  as  an  aid  to  radiologists  in  the 
interpretation  of  screening  mammograms. 
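The underlying idea of bilateral subtraction can be sketched with the simple linear variant mentioned above as the comparison method. The nonlinear technique, which links multiple subtracted images, is not reproduced here. The sketch assumes that the two images are already aligned, that larger pixel values mean denser tissue, and that a single fixed threshold is adequate. The function names and the toy images are hypothetical.

```python
def mirror(img):
    """Flip an image left to right so the left breast overlays the right."""
    return [row[::-1] for row in img]

def bilateral_difference(right, left):
    """Right image minus the mirrored left image (assumes prior alignment)."""
    return [[r - l for r, l in zip(r_row, l_row)]
            for r_row, l_row in zip(right, mirror(left))]

def asymmetric_sites(diff, threshold):
    """Pixels noticeably denser on the right (positive) or left (negative)."""
    right_sites, left_sites = [], []
    for i, row in enumerate(diff):
        for j, value in enumerate(row):
            if value > threshold:
                right_sites.append((i, j))
            elif value < -threshold:
                # Convert back to coordinates of the unmirrored left image.
                left_sites.append((i, len(row) - 1 - j))
    return right_sites, left_sites

# Toy pair: symmetric architecture, plus one dense spot in the right image.
base = [[1, 2, 3, 4], [2, 3, 4, 5], [1, 2, 3, 4]]
left = mirror(base)
right = [row[:] for row in base]
right[1][1] += 10

right_sites, left_sites = asymmetric_sites(
    bilateral_difference(right, left), threshold=5)
print(right_sites, left_sites)  # [(1, 1)] []
```

The symmetric background cancels in the subtraction, so only the asymmetric density remains as a candidate site for later feature analysis.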

We  continued  to  investigate  the  characteristics  of  actual  masses  and  non-mass 
detections  in  order  to  develop  feature-analysis  techniques  with  which  to  reduce  the 
number  of  non-mass  (i.e.,  false-positive)  detections.  These  feature-analysis  techniques 
involved  extraction  of  various  features  such  as  area,  contrast,  circularity  and  border- 
distance  based  on  the  density  and  geometric  information  of  masses  in  both  processed  and 
original  breast  images.  Cumulative  histograms  of  both  actual-mass  detections  and
non-mass  detections  were  used  to  characterize  extracted  features  and  to  determine  the  cutoff
values  used  in  the  feature  tests.  The  effectiveness  of  the  feature-analysis  techniques  was 
evaluated  using  FROC  analysis.  Results  showed  that  the  feature-analysis  techniques 
effectively  improved  the  performance  of  the  computerized  detection  scheme:  about  35% 
of  false-positive  detections  were  eliminated  without  loss  in  sensitivity  (ref.  31). 
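One way to read the cutoff selection described above is as follows: for a feature on which actual masses tend to score higher than non-mass detections, the cutoff is placed at the lowest value seen among actual masses, so that no true detection is lost, and the cumulative distribution of non-mass detections then gives the fraction eliminated. The sketch below follows that reading. The feature values are invented for illustration, and the rule of taking the minimum is an assumption rather than the procedure reported in ref. 31.

```python
def cumulative_fraction(values, cutoff):
    """Fraction of values strictly below the cutoff (one cumulative-histogram point)."""
    return sum(v < cutoff for v in values) / len(values)

def lossless_cutoff(mass_values):
    """Largest cutoff that still keeps every actual-mass detection."""
    return min(mass_values)

# Hypothetical circularity values for actual-mass and non-mass detections.
mass_circularity = [0.62, 0.70, 0.81, 0.90]
nonmass_circularity = [0.30, 0.45, 0.55, 0.58, 0.66, 0.75, 0.85, 0.95]

cutoff = lossless_cutoff(mass_circularity)
eliminated = cumulative_fraction(nonmass_circularity, cutoff)
lost = cumulative_fraction(mass_circularity, cutoff)
print(cutoff, eliminated, lost)  # 0.62 0.5 0.0
```

Detections with a feature value below the cutoff are discarded, and in this toy data half of the non-mass detections are removed with no loss in sensitivity.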

We  have  developed  an  automated  technique  for  the  alignment  of  right  and  left 
breast  images  for  use  in  the  computerized  analysis  of  bilateral  breast  images.  In  this 
technique  (ref.  32),  the  breast  region  was  first  identified  by  use  of  histogram  analysis  and 
morphological  operations.  The  anterior  portions  of  the  tracked  breast  border  and 
computer-identified  nipple  positions  were  selected  as  landmarks  for  image  registration. 
The  paired  right  and  left  breast  images  were  then  registered  relative  to  each  other  by  use 
of  a  least-squares  matching  method.  Based  on  FROC  and  regression  analyses,  the 
detection  performance  obtained  with  the  automated  alignment  technique  was  found  to  be 
higher  than  that  obtained  with  simulated  misalignments.  These  results  indicated  that 
automatic  alignment  of  breast  images  is  feasible  and  that  mass-detection  performance 
appears  to  improve  with  the  inclusion  of  asymmetric  anatomic  information  and  is  not 
sensitive  to  slight  misalignment.

We  also  investigated  the  effect  of  case  selection  on  the  performance  of  a  CAD
scheme,  since  the  choice  of  clinical  cases  used  to  test  the  scheme  can  affect  the  test 
results.  In  this  study,  we  deliberately  modified  the  components  of  our  database  to  study 
the  effects  of  this  modification  on  measured  performance.  Using  our  computerized 
scheme  for  the  automated  detection  of  breast  masses  from  mammograms,  we  found  that 
the  sensitivity  of  the  scheme  ranged  from  26%  to  100%  (at  a  false  positive  rate  of  1.0
per  image),  depending  on  the  cases  used  to  test  the  scheme.  Even  a  20%  change  in  the 
cases  comprising  the  database  reduced  the  measured  sensitivity  by  15-25%  (ref.  33). 
Because  of  the  strong  dependence  of  measured  performance  on  the  testing  database,  it  is 
difficult  to  estimate  reliably  the  accuracy  of  a  CAD  scheme.  Moreover,  it  is  questionable 
to  compare  different  CAD  schemes  when  different  cases  are  used  for  testing.  Sharing 
databases,  creating  a  common  database,  or  using  a  quantitative  measure  to  characterize 
databases  are  possible  solutions  to  this  problem.  However,  none  of  these  solutions  exists 
or  is  practiced  at  present.  Therefore,  as  a  short-term  solution,  we  recommend  that  the 
method  used  for  selecting  cases  and  histograms  of  relevant  image  features  be  reported 
whenever  performance  data  are  presented. 

The  importance  of  our  findings  is  that  a  nonlinear  bilateral  subtraction  technique 
can  detect  mammographic  masses  at  a  high  level  of  sensitivity  that  is  again  comparable
to  levels  obtained  by  radiologists. 

(3)  Computed  Detection  of  Lesions  Missed  by  Mammography 

Over  the  past  6  years,  we  have  been  collecting  cases  in  which  a  lesion  was  missed 
in  a  mammogram.  To  date,  69  cases  with  a  lesion  that  went  undetected  by  a  radiologist
were  analyzed  by  the  two  detection  schemes,  one  for  clustered  microcalcifications  and  one  for  masses
(ref.  37).  In  all  cases  the  lesions  were  rated  retrospectively  as  being  subtle  to  extremely
subtle  by  an  experienced  radiologist.  The  computer  schemes  correctly  identified 
approximately  50%  of  the  missed  lesions:  54%  of  the  malignant  lesions  and  45%  of  the
benign  lesions.  The  false  positive  rate  was  1.3  per  image.  This  result  shows  that  our
computer  detection  schemes  are  capable  of  identifying  cancers  that  are  overlooked  by 
radiologists. 

(4)  Classification  Schemes 

We  have  developed  a  method  for  differentiating  malignant  from  benign  clustered 
microcalcifications  in  which  image  features  are  both  extracted  and  analyzed  by  a 
computer.  One  hundred  mammograms  obtained  from  53  patients  who  had  biopsies  for 
suspicious  clustered  microcalcifications  were  used.  Our  technique  used  8  computer- 
extracted  features  of  clustered  microcalcifications  that  were  merged  by  an  artificial  neural 
network.  Features  were  based  on  the  size  and  shape  of  clusters  and  on  the  size,  shape, 
contrast,  and  uniformity  of  individual  microcalcifications  comprising  a  cluster.  Human 
input  was  limited  to  initial  identification  of  the  microcalcifications.  Our  method  correctly 
classified  100%  of  patients  with  breast  cancer  and  69%  of  patients  with  biopsy-proven 
benign  conditions.  ROC  analysis  showed  that  our  method  performed  significantly
better  (p=0.03)  than  5  radiologists  who  reviewed  the  mammograms  retrospectively.  This
result  indicated  that  quantitative  features  extracted  by  a  computer  can  be  analyzed  by  a 
computer  to  distinguish  malignant  from  benign  clustered  microcalcifications,  and  that  our 
technique  can  potentially  help  radiologists  to  reduce  the  number  of  “false-positive” 
biopsies. 

2.  BODY 

2.1.  Experimental  Methods,  Assumptions  and  Procedures

The  overall  plan  of  this  demonstration  project  involves  four  major  steps,  namely, 
(1)  further  development  of  advanced  CAD  schemes,  (2)  development  of  prototype 
mammography  CAD  workstations,  (3)  clinical  evaluation  of  prototype  workstations,  and 
(4)  analysis  of  outcomes  from  clinical  evaluations.

The  primary  goal  of  this  study  is  to  demonstrate  that  approximately  23  additional
breast  cancers  will  be  detected  by  the  use  of  prototype  mammography  CAD  workstations 
for  approximately  45,000  screenees  who  are  expected  to  enter  a  three-year  clinical 
evaluation  at  two  hospitals.  The  potential  of  detecting  23  additional  breast  cancers  was 
estimated  from  an  average  breast  cancer  incidence  rate  of  five  per  1,000  screenees,  a 
current  average  miss  rate  of  20%,  and  a  level  of  CAD  performance  that  detects  50%  of 
currently  missed  cancer  lesions. 
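The arithmetic behind the estimate of approximately 23 additional cancers can be checked directly from the three stated rates. The function and parameter names below are hypothetical. Exact fractions are used to avoid floating-point rounding, and rounding 22.5 up to 23 is one reading of "approximately 23".

```python
from fractions import Fraction
import math

def expected_additional_cancers(screenees, incidence, miss_rate, cad_recovery):
    """Cancers that are present, currently missed, and then recovered with CAD."""
    cancers_present = screenees * incidence          # cancers among all screenees
    currently_missed = cancers_present * miss_rate   # overlooked without CAD
    return currently_missed * cad_recovery           # of those, detected by CAD

estimate = expected_additional_cancers(
    screenees=45_000,
    incidence=Fraction(5, 1000),     # five cancers per 1,000 screenees
    miss_rate=Fraction(20, 100),     # current average miss rate of 20%
    cad_recovery=Fraction(50, 100),  # CAD detects 50% of currently missed lesions
)
print(float(estimate), math.ceil(estimate))  # 22.5 23
```

That is, 45,000 screenees at 5 per 1,000 gives 225 expected cancers, of which 45 would be missed at a 20% miss rate, and half of those (22.5) would be recovered with CAD.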

Advanced  CAD  schemes  will  be  developed  for  detection  of  clustered 
microcalcifications  and  masses  as  well  as  characterization  of  detected  lesions  by 
integrating  a  number  of  new  methods  into  the  existing  programs  and  optimizing  a  number 
of  parameters  for  achieving  high  performance  levels  above  the  current  ones.  Two  kinds 
of  prototype  mammography  CAD  workstations  will  be  developed.  The  first  prototype 
unit  is  based  on  the  existing  intelligent  workstation  at  the  University  of  Chicago,  which 
will  incorporate  the  most  advanced  CAD  software  and  will  be  used  for  clinical  evaluation 
on  approximately  30  screenees  per  day  at  the  University  of  Chicago.  The  second  type  is 
the  prototype  commercial  units  which  will  be  developed  by  R2  Technology,  Inc.,  and  will 
be  used  for  clinical  evaluation  on  approximately  30  screenees  per  day  at  LaGrange 
Memorial  Hospital. 

The  impact  of  the  computer  output  from  the  prototype  workstation  will  be 
evaluated  by  examining  if  and  when  the  radiologist  changes  his/her  initial  diagnosis.  The 
computer  output  will  be  presented  to  the  radiologist  only  after  the  radiologist  has  entered 
his/her  initial  findings  into  the  computer  as  to  the  normal  and  abnormal  lesion(s).  A 
particularly  important  datum  in  this  demonstration  project  is  the  measurement  of  the 
number  of  breast  cancer  cases  on  which  the  radiologist  did  not  initially  indicate  the  breast 
cancer  lesion  but  did  make  a  final  correct  diagnosis  by  using  the  computer  output  as  a 
“second  opinion.” 




In  this  demonstration  project,  we  will  not  direct  effort  toward  the  development  of 
major  new  methods  and  techniques  on  mammographic  CAD  schemes.  Instead,  we  plan 
to  incorporate  several  useful  methods  and  techniques,  which  were  recently  developed,  into
the  CAD  software  package  for  implementation  in  the  prototype  intelligent  mammography 
workstation.  It  is  important  to  note  that  considerable  research  effort  would  be  required  to 
optimize  many  parameters  associated  with  new  CAD  methods  and  the  existing  CAD 
algorithms  in  order  to  integrate  all  of  the  components  into  a  single  package  that  functions 
successfully. 

In  the  first  phase  of  this  project,  we  plan  to  develop  advanced  CAD  schemes  for 
detection  of  clustered  microcalcifications  and  masses,  and  then  to  incorporate  them  into 
the  prototype  intelligent  mammography  workstation  for  clinical  evaluation  at  the 
University  of  Chicago.  However,  as  the  performance  of  advanced  CAD  schemes  in  our 
laboratory  improves  through  continued  efforts  on  the  optimization  process,  the  CAD 
software  package  in  the  workstation  will  be  upgraded  as  needed.  In  the  second  phase  of 
this  project,  we  plan  to  incorporate  additional  CAD  schemes  to  characterize  detected 
lesions  as  benign  or  malignant. 

(1)  Automated  scheme  for  detection  of  clustered  microcalcifications 

We  plan  to  investigate  and  incorporate  three  new  approaches  to  improve  the 
performance  of  automated  detection  of  clustered  microcalcifications.  They  are  (1)  local 
edge-gradient  analysis  techniques  for  reduction  of  false-positives,  (2)  shift-invariant 
neural  networks  for  removal  of  false-positives,  and  (3)  wavelet  transform  techniques  for 
improvement  in  the  sensitivity  in  detecting  clustered  microcalcifications,  as  described 
below.  Many  parameters  associated  with  these  approaches  will  be  selected  carefully  to 
optimize  the  overall  performance  in  detecting  clustered  microcalcifications.  It  is
important  to  note  that  previous  studies  on  these  methods  were
based  on  mammograms  digitized  using  a  drum  scanner.  In  this  project,  we  plan  to
determine  all  of  the  new  parameters  with  mammograms  digitized  using  a  laser  scanner
that  is  integrated  into  the  prototype  intelligent  mammography  workstation. 

(2)  Automated  scheme  for  detection  of  masses 

We  plan  to  investigate  and  incorporate  three  new  approaches  to  improve  the 
performance  of  automated  detection  of  mass  lesions:  (1)  Hough  spectrum  analysis  for 
the  detection  of  spiculated  lesions  and  architectural  distortions;  (2)  gradient  and 
circularity  analysis  for  the  detection  of  very  small  early  cancers;  and  (3)  artificial  neural 
networks  for  the  merging  of  various  features  of  suspect  lesions,  identified  either  by  the 
bilateral  subtraction  method  or  by  the  two  new  single  image  methods,  in  order  to  reduce 
the  number  of  false-positive  detections. 

(3)  Automated  scheme  for  characterization  of  detected  lesions 

In  the  second  phase  of  development  of  advanced  CAD  schemes,  we  plan  to 
investigate  and  incorporate  two  automated  schemes  for  distinguishing  between  benign 
and  malignant  lesions  for  both  detected  clustered  microcalcifications  and  masses.  The
likelihood  of  malignancy  of  each  detected  suspicious  lesion  will  be  calculated  from  our
schemes  and  will  be  displayed  together  with  the  location(s)  of  detected  lesion(s)  on  the 
prototype  mammography  CAD  workstation  at  the  University  of  Chicago.  We  plan  to 
investigate  whether  the  calculated  likelihood  of  malignancy  added  to  the  CAD  computer 
output  may  improve  the  diagnosis  of  breast  cancer  by  reducing  the  false-positives  and 
false-negatives. 

(4)  Development  of  prototype  mammography  CAD  workstations 

We  plan  to  develop  two  kinds  of  prototype  mammography  CAD  workstations  for 
clinical  evaluation.  One  is  based  on  the  existing  intelligent  mammography  workstation  at 
the  University  of  Chicago,  which  will  be  modified  by  incorporating  advanced  CAD
software  and  by  improving  some  aspects  of  the  hardware  configuration.  This  first
prototype  system  will  be  used  for  clinical  evaluation  at  the  University  of  Chicago.  The 
second  type  of  prototype  system  will  be  developed  by  R2  Technology,  Inc.,  as  a  potential 
commercial  unit,  and  will  be  placed  for  clinical  evaluation  at  LaGrange  Memorial 
Hospital.  Although  the  basic  principles  employed  in  the  two  kinds  of  prototype 
workstations  are  similar  due  to  licensing  of  the  University  of  Chicago  technologies  to  R2 
Technology,  Inc.,  these  two  systems  are  not  identical.  Therefore,  we  plan  to  investigate 
the  levels  of  performance  of  each  prototype  workstation. 

(5)  Clinical  evaluation  of  prototype  mammography  CAD  workstations 

Multi-institutional  clinical  evaluation  of  mammography  CAD  workstations  will 
be  carried  out  at  two  clinical  sites:  the  Mammography  Section  of  the  Department  of 
Radiology,  the  University  of  Chicago  and  LaGrange  Memorial  Hospital  in  LaGrange, 
Illinois.  The  number  of  screenees  per  day  who  will  enter  this  clinical  evaluation  at  each 
of  the  two  hospitals  is  approximately  30.  The  total  number  of  screenees  per  day  will  be 
60. We have already obtained approval from the Institutional Review Board (IRB) for
clinical  evaluation  of  the  prototype  intelligent  mammography  workstation  at  the 
University  of  Chicago  and  LaGrange  Memorial  Hospital. 

To  examine  the  impact  of  mammographic  CAD  on  clinical  outcomes,  we  plan  to 
obtain  data  from  mammography  audits  without  and  with  the  prototype  CAD 
workstations.  For  the  first  six  months  of  this  project,  the  CAD  workstation  will  not  be 
used  and  we  will  collect  results  of  mammography  audits.  For  the  next  year,  the  first 
clinical  evaluation  with  the  CAD  workstation  will  be  carried  out.  Then,  a  second 
mammography audit will be conducted for the subsequent six-month period without the
CAD  workstation.  We  believe  that  this  second  segment  will  be  useful  to  obtain 
additional  baseline  data  and  also  to  examine  the  potential  variation  in  the  baseline  data 
without  the  CAD  workstation  being  used  clinically.  For  the  final  two-year  period,  the 
second clinical evaluation of the CAD workstations will be carried out. We will audit the
three years in total during which the CAD workstations were used and compare those
results with the two six-month audits conducted without CAD. This will allow us to study the effects of
CAD  by  comparing  parameters  such  as  sensitivity,  call-back  rates,  positive  predictive 
value,  etc. 
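
As a minimal sketch of how such audit parameters could be tabulated for one audit period, the following assumes that each screenee is counted once as a true positive, false positive, true negative, or false negative, with "positive" meaning that the screenee was called back. The function name and the counts are hypothetical and are not data from this project.

```python
def audit_metrics(true_pos, false_pos, true_neg, false_neg):
    """Summary measures for one audit period from counts of screening outcomes.

    Assumption: a "positive" is a screenee called back for further work-up.
    """
    total = true_pos + false_pos + true_neg + false_neg
    called_back = true_pos + false_pos
    return {
        "sensitivity": true_pos / (true_pos + false_neg),
        "call_back_rate": called_back / total,
        "ppv": true_pos / called_back,
    }


# Hypothetical counts for two audit periods (illustration only).
without_cad = audit_metrics(true_pos=8, false_pos=92, true_neg=898, false_neg=2)
with_cad = audit_metrics(true_pos=9, false_pos=101, true_neg=889, false_neg=1)

for name in ("sensitivity", "call_back_rate", "ppv"):
    print(name, round(without_cad[name], 3), round(with_cad[name], 3))
```

Comparing the two dictionaries side by side mirrors the planned comparison of audit periods without and with the CAD workstation.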

For  daily  clinical  evaluation  of  the  CAD  workstations,  all  screening  mammograms 
will  be  digitized  by  a  research  technologist  at  each  of  the  two  sites  and  the  computed 
results  from  the  CAD  schemes  will  be  stored.  When  the  radiologist  reads  the  original  film 
mammogram, he/she will enter his/her findings on normal or abnormal lesion(s) into the CAD
workstation  using  a  light  pen  and  soft  copy  of  the  mammograms  on  CRT  monitors.  Then, 
the  computer  output  will  be  indicated  on  the  monitor.  The  radiologist  will  then  have  an 
opportunity  to  modify  his/her  opinion  using  the  light  pen.  If  the  radiologist  changes 
his/her  initial  diagnosis  due  to  the  computer  output,  then  the  radiologist  will  enter  the  final 
result  into  the  computer.  With  this  procedure,  we  will  be  able  to  determine  the  number  of 
breast  cancer  cases  on  which  the  radiologist  may  miss  the  lesion  initially  but  may  correct 
his/her  findings  using  the  CAD  output. 

(6) Analysis of outcomes from clinical evaluation of prototype mammography CAD workstations

The  effect  of  mammography  CAD  workstations  on  clinical  outcomes  in  the 
detection  of  breast  cancer  will  be  analyzed  both  prospectively  on  a  daily  basis  using  the 
workstation  and  on  a  semi-annual  basis  using  the  results  of  mammography  audits. 
Radiologists’  performance  will  be  evaluated  as  a  group  and  also  as  individuals  in  order  to 
examine  the  inter-  and  intra-observer  variability.  Since  each  of  the  two  clinical  sites  has 
already  established  its  own  mammography  audit  system,  data  for  “truth”  in  terms  of 
normal/abnormal  (breast  cancer)  cases  will  be  obtained  from  each  site’s  mammography 
audit  system  for  analysis  of  outcomes  in  this  demonstration  project. 

2.2 Results and Discussion

(1)  Development  of  automated  detection  scheme  for  clustered  microcalcifications 

We  have  been  developing  techniques  for  optimizing  our  rule-based  scheme. 
Previously,  we  investigated  the  use  of  a  genetic  algorithm  for  selecting  the  optimum  set 
of  thresholds  for  our  detection  scheme.  The  genetic  algorithm  used  a  cost  function  that 
combined  the  false-positive  rate  and  the  true-positive  rate  to  produce  a  single  value.  This 
required  us  to  arbitrarily  assign  weightings  to  true-positive  versus  false-positive  rates, 
which  is  a  very  difficult  task.  Using  this  approach,  a  single  set  of  thresholds 
corresponding  to  one  set  of  true-positive  and  false-positive  rates  (a  single  operating  point 
on  an  FROC  curve)  was  obtained.  However,  if  the  weightings  assigned  to  the  true-  and 
false-  positive  rates  were  not  the  best  choice,  then  the  solution  of  the  genetic  algorithm 
would  not  be  optimal.  Because  only  a  single  operating  point  was  obtained,  it  is  difficult 
to  assess  the  appropriateness  of  this  solution.  We  are  now  investigating  the  use  of  a 
multi-objective  genetic  algorithm,  which  produces  an  optimum  FROC  curve,  not  just  a 
single operating point. Using the previous genetic algorithm approach, an optimum
solution corresponding to 87% sensitivity at 1.0 false positives per image was obtained.
Using the multi-objective approach, the same operating point was obtained, in addition to
a number of other points. For a lower false-positive rate, say 0.2 per image, a sensitivity of
83%  can  be  obtained.  A  higher  sensitivity,  say  95%,  can  be  obtained  at  2  false  positives 
per  image.  We  believe  that  the  multi-objective  genetic  algorithm  is  the  best  approach  for 
optimizing  our  scheme. 
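
The difference between the two approaches can be sketched as follows. This is an illustration under stated assumptions, not our optimization code: a single weighted cost picks one operating point that depends on the chosen weighting, whereas a multi-objective search keeps every non-dominated (false positives per image, sensitivity) pair. The three leading candidate points are the ones quoted above; the other three, the weights, and the function names are hypothetical.

```python
def pareto_front(points):
    """Return the non-dominated (false positives per image, sensitivity) pairs.

    A point is dominated if some other point has no more false positives and
    no lower sensitivity, and is strictly better in at least one of the two.
    """
    front = []
    for fp, sens in points:
        dominated = any(
            (fp2 <= fp and sens2 >= sens) and (fp2 < fp or sens2 > sens)
            for fp2, sens2 in points
        )
        if not dominated:
            front.append((fp, sens))
    return sorted(set(front))


def single_operating_point(points, fp_weight):
    """Single-objective choice: minimize fp_weight * FP rate + miss rate."""
    return min(points, key=lambda p: fp_weight * p[0] + (1.0 - p[1]))


candidates = [
    (0.2, 0.83), (1.0, 0.87), (2.0, 0.95),  # operating points quoted in the text
    (1.0, 0.80), (2.5, 0.90), (0.5, 0.70),  # hypothetical inferior points
]
front = pareto_front(candidates)
print(front)
print(single_operating_point(candidates, fp_weight=0.1))
print(single_operating_point(candidates, fp_weight=0.01))
```

The two weightings select opposite ends of the same front, which is why a single cost value makes it hard to judge whether the selected solution is appropriate.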

We  have  investigated  the  use  of  a  multi-objective  genetic  algorithm  (MOGA)  to 
optimize  our  rule-based  scheme.  The  MOGA  is  a  search  method  to  find  the  optimal  set 
of  sensitivity  and  specificity  pairs  by  efficiently  searching  through  the  set  of  all  possible 
solutions.  The  initial  study  of  the  MOGA  was  done  on  our  standard  dataset  for 
development  of  techniques.  We  are  now  in  the  process  of  optimizing  our  detection 
scheme  using  the  MOGA  on  a  set  of  cases  from  our  clinical  database,  augmented  with  an
additional  50  cancer  cases  selected  from  our  film  library. 

(2)  Development  of  automated  detection  scheme  for  masses 

We  have  incorporated  three  techniques  to  improve  the  overall  performance  of  our 
CAD schemes for detection of masses. These techniques are Hough spectrum
analysis, gradient and circularity analysis, and artificial neural networks. We attempted
to achieve high overall performance by optimal selection of the many parameters
involved  in  this  scheme  and  also  to  examine  various  classifiers  to  distinguish  between 
lesions  and  false  positives.  In  our  CAD  scheme,  many  features  are  extracted  from 
potential  lesion  sites  and  merged  into  a  single  decision  variable  using  a  classifier. 
Numerous features can be extracted from potential lesion sites, making it difficult to
optimally  choose  representative  features  to  be  used  as  inputs  to  a  classifier.  We  have 
undertaken  the  problem  of  feature  selection  for  two  different  classifiers  using  a  dataset 
consisting  of  features  extracted  from  lesions  and  false-positive  detections.  We  have 
applied  traditional  feature  selection  techniques  such  as  single  feature  selectors  and 
stepwise  selectors.  In  addition,  we  have  applied  genetic  algorithms  to  this  search  task.  A 
genetic  algorithm  is  an  optimization  technique  loosely  based  on  natural  selection. 
Multiple  solutions  to  a  problem  are  randomly  generated  and  their  “fitness”  is  evaluated. 
Solutions  with  better  fitness  values  are  more  likely  to  survive  to  subsequent  generations, 
while  solutions  with  a  poor  fitness  value  will  “die  out.”  This  “survival  of  the  fittest” 
strategy  usually  results  in  a  rapid  convergence  to  the  optimal  solution.  By  employing 
genetic  algorithms,  we  have  improved  the  Az  of  our  mass  CAD  scheme  from  0.96  to  0.98 
using  artificial  neural  networks.  With  linear  discriminants,  the  Az  improved  from  0.93  to 
0.95.  The  results  from  the  linear  discriminant  analysis  show  that  the  genetic  algorithm 
feature selection method is as good as, if not better than, the stepwise method. Similar
results  were  obtained  for  the  artificial  neural  network  classifiers  but  the  results  were  not 
as strong. As with all studies employing neural networks, it is possible that there is
overfitting of the data. We attempted to minimize this effect by simplifying the structure of
our  networks  and  by  employing  cross-validation  or  leave-one-out  tests. 
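
The "survival of the fittest" search over feature subsets can be sketched as below. This is a toy illustration only: the selection rule (the fitter half survives), one-point crossover, the mutation rate, and the fitness function are all assumptions chosen for brevity, and in practice the fitness would be a classifier performance measure such as Az rather than the hypothetical score used here.

```python
import random


def genetic_feature_search(n_features, fitness, pop_size=20, generations=30,
                           mutation_rate=0.2, seed=0):
    """Search for a 0/1 feature mask that maximizes `fitness`."""
    rng = random.Random(seed)
    population = [tuple(rng.randint(0, 1) for _ in range(n_features))
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        survivors = ranked[: pop_size // 2]       # poorer half "dies out"
        children = []
        while len(survivors) + len(children) < pop_size:
            mother, father = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_features)    # one-point crossover
            child = list(mother[:cut] + father[cut:])
            if rng.random() < mutation_rate:      # occasional single-bit flip
                i = rng.randrange(n_features)
                child[i] = 1 - child[i]
            children.append(tuple(child))
        population = survivors + children
    return max(population, key=fitness)


USEFUL = {0, 2, 5}  # hypothetical set of informative features


def toy_fitness(mask):
    """Reward informative features, with a small penalty per selected feature."""
    return sum(mask[i] for i in USEFUL) - 0.1 * sum(mask)


best = genetic_feature_search(8, toy_fitness)
print(best, toy_fitness(best))
```

Because the survivors are carried over unchanged, the best fitness in the population can never decrease from one generation to the next.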

A  new  development,  which  is  now  being  implemented  into  the  mass  detection 
scheme,  is  a  new  region  growing  algorithm.  We  have  developed  two  novel  lesion 
segmentation techniques: one based on a single feature called the radial gradient index
(a feature similar to that described above) and one based on a simple probabilistic model that
segments mass lesions from the surrounding background. In both methods, a series of image
partitions  is  created  using  gray-level  information  as  well  as  prior  knowledge  of  the  shape 
of  typical  mass  lesions.  With  the  former  method  the  partition  that  maximizes  the  radial 
gradient  index  is  selected.  In  the  latter  method,  probability  distributions  for  gray-levels 
inside  and  outside  the  partitions  are  estimated,  and  subsequently  used  to  determine  the 
probability  that  the  image  occurred  for  each  given  partition.  The  partition  that  maximizes 
this  probability  is  selected  as  the  final  lesion  partition  (contour).  We  tested  these 
methods  against  our  previous  region-growing  algorithm  using  a  database  of  biopsy- 
proven,  malignant  lesions  and  found  that  the  new  lesion  segmentation  algorithms  more 
closely  match  radiologists'  outlines  of  these  lesions.  At  an  overlap  threshold  of  0.30,  gray 
level  region  growing  correctly  delineates  62%  of  the  lesions  in  our  database  while  the 
radial  gradient  index  algorithm  and  the  probabilistic  segmentation  algorithm  correctly 
segment  92%  and  96%  of  the  lesions,  respectively.  With  these  new  segmentation  results 
we hope to find and extract new features that will help differentiate between actual lesions
and  false-positive  detections,  thus  improving  the  overall  performance  of  computerized 
mass  detection. 
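
To make the overlap criterion concrete, the sketch below scores a computer segmentation against a radiologist's outline. The report does not define the overlap measure, so the definition used here (area of intersection divided by area of union) is an assumption, and the square "lesions" are hypothetical test shapes.

```python
def overlap(computer_pixels, radiologist_pixels):
    """Assumed overlap measure: intersection area divided by union area."""
    computer_pixels = set(computer_pixels)
    radiologist_pixels = set(radiologist_pixels)
    union = computer_pixels | radiologist_pixels
    if not union:
        return 0.0
    return len(computer_pixels & radiologist_pixels) / len(union)


def fraction_correct(pairs, threshold=0.30):
    """Fraction of lesions whose segmentation reaches the overlap threshold."""
    hits = sum(1 for comp, rad in pairs if overlap(comp, rad) >= threshold)
    return hits / len(pairs)


def square(row0, col0, size):
    """Hypothetical square region given as a set of (row, col) pixels."""
    return {(r, c) for r in range(row0, row0 + size)
            for c in range(col0, col0 + size)}


radiologist = square(0, 0, 10)
good = square(2, 2, 10)   # shares an 8 x 8 block with the outline
poor = square(7, 7, 10)   # shares only a 3 x 3 block with the outline
print(overlap(good, radiologist), overlap(poor, radiologist))
print(fraction_correct([(good, radiologist), (poor, radiologist)]))
```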

Our  computerized  detection  method  for  masses  initially  identifies  various  suspect 
lesion  sites.  Features  from  these  sites  are  then  extracted  automatically  and  merged  by  a 
classifier  in  order  to  reduce  the  number  of  false-positive  detections.  Different  subsets  of 
features  will,  in  general,  result  in  different  classification  performances.  We  investigated 
the effect of limited datasets on feature selection. We showed that, with limited
datasets  and/or  a  large  number  of  features  from  which  to  choose,  bias  is  introduced  if  the 
classifier  parameters  are  determined  using  the  same  data  that  were  employed  to  select  the 
"optimal"  set  of  features. 

We  have  investigated  the  use  of  a  Bayesian  neural  network  in  the  merging  of 
computer-extracted  features  of  actual  lesions  and  false-positive  detections.  We  found 
that  with  a  limited  dataset,  use  of  the  Bayesian  network  reduces  the  potential  for 
overtraining  typically  encountered  in  conventional  neural  networks. 

We  have  previously  investigated  the  use  of  a  radial  gradient  index  (RGI)  to  aid  in 
the  segmentation  of  mass  lesions  from  parenchymal  background  in  digitized 
mammograms. In this work, we developed a non-linear filtering algorithm based on the
RGI that creates RGI feature images from digital mammograms, which can be
subsequently thresholded to distinguish between mass lesions and normal regions. This
initial stage of mass detection is focussed on improving sensitivity, leaving later feature
analysis  and  classification  stages  for  reducing  false-positive  detections.  Using  just  RGI 
filtering,  we  achieved  a  sensitivity  of  93%  with  16  false  detections  per  image  on  a 
database  of  60  patients  (112  images).  After  feature  analysis  and  classification  on  the 
suspect regions, a by-patient sensitivity of 77% at 2 false positives per image was
obtained. 
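
For readers unfamiliar with the RGI, the sketch below computes one common form of the index: the sum of the gradient components along the radial direction divided by the sum of the gradient magnitudes, which lies between -1 and 1. The report does not state the formula, so this form, the outward sign convention, and the function name are assumptions for illustration.

```python
import math


def radial_gradient_index(center, points, gradients):
    """Assumed RGI: sum of radial gradient components / sum of gradient magnitudes.

    `points` are (x, y) positions on a candidate margin and `gradients` are the
    image gradient vectors (gx, gy) at those positions.
    """
    cx, cy = center
    radial_sum = 0.0
    magnitude_sum = 0.0
    for (x, y), (gx, gy) in zip(points, gradients):
        dx, dy = x - cx, y - cy
        dist = math.hypot(dx, dy)
        if dist == 0.0:
            continue  # the radial direction is undefined at the center itself
        radial_sum += (gx * dx + gy * dy) / dist
        magnitude_sum += math.hypot(gx, gy)
    return radial_sum / magnitude_sum if magnitude_sum else 0.0


margin = [(1, 0), (0, 1), (-1, 0), (0, -1)]
radial = [(2, 0), (0, 2), (-2, 0), (0, -2)]      # gradients along the radii
tangential = [(0, 1), (-1, 0), (0, -1), (1, 0)]  # gradients along the margin
print(radial_gradient_index((0, 0), margin, radial))
print(radial_gradient_index((0, 0), margin, tangential))
```

Purely radial gradients give an index of magnitude 1 and purely tangential gradients give 0, which is why the index favors round, well-circumscribed candidate regions.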

(3)  Development  of  automated  scheme  for  characterization  of  clustered  microcalcifications 

We  have  developed  an  automated  scheme  for  the  classification  of  clustered 
microcalcifications as malignant or benign. We have shown in previous studies that this
computer scheme is more accurate than radiologists in differentiating between
malignant  and  benign  microcalcifications.  We  have  also  shown  in  an  observer  study  that 
this  computer  aid  can  help  radiologists  improve  their  diagnostic  accuracy  and  improve 
their  biopsy  recommendations.  In  this  observer  study,  ten  radiologists  read  the
mammograms  from  104  patients  with  and  without  our  computer  aid,  and  they  reported 
their  confidence  that  a  microcalcification  cluster  represented  a  malignancy  and  also 
reported  their  recommendation  of  biopsy  or  follow-up.  We  performed  two  additional 
analyses  of  the  observer  study  data  to  investigate  the  effects  of  the  computer 
classification  scheme  on  radiologists'  diagnostic  performance. 

In  one  analysis,  we  compared  the  variability  with  and  without  our  computer  aid  in 
the  radiologists'  interpretation  of  malignant  and  benign  microcalcifications  and  in  their 
recommendations  for  biopsy  or  follow-up.  First,  when  the  computer  aid  was  used, 
variation  in  the  radiologists’  diagnostic  accuracy  as  measured  by  the  standard  deviation 
of  the  area  under  the  ROC  curves  (Az)  was  reduced  47%.  This  reduction  in  variability  is 
in  addition  to  a  statistically  significant  gain  in  diagnostic  accuracy,  as  measured  by  (1)  an 
increase  of  the  average  of  Az  from  0.61  to  0.75  (p<0.0001),  (2)  an  increase  of  6.4 
biopsies  per  radiologist  on  cancer  cases  (p=0.0006),  and  (3)  a  decrease  of  6.0  biopsies  per 
radiologist  on  benign  lesions  (p=0.003).  Second,  use  of  the  computer  aid  increased  the 
agreement  by  all  ten  observers  from  13%  to  32%  of  total  cases  (p  =  0.0002).  The  kappa 
statistic, which is a quantitative measure of agreement, increased from 0.19 to 0.41
(p<0.05).  Finally,  use  of  the  computer  aid  eliminated  two  thirds  of  substantial 
disagreements  where  biopsy  and  routine  screening  were  recommended  for  the  same 
patient  by  two  radiologists  (p<0.05).  We  conclude  that  CAD  holds  the  potential  to 
reduce  the  variability  in  radiologists’  interpretation  of  mammograms  in  addition  to  its 
potential  to  improve  diagnostic  accuracy. 
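
As an illustration of the kappa statistic, the sketch below computes Cohen's kappa for two readers' recommendations. It is a two-reader simplification; the multi-reader computation used for the ten observers is not specified here, and the recommendations shown are hypothetical.

```python
def cohen_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two readers over the same cases."""
    n = len(ratings_a)
    categories = set(ratings_a) | set(ratings_b)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from each reader's marginal frequencies.
    expected = sum(
        (ratings_a.count(c) / n) * (ratings_b.count(c) / n) for c in categories
    )
    return (observed - expected) / (1.0 - expected)


# Hypothetical recommendations for ten cases (illustration only).
reader_a = ["biopsy"] * 5 + ["follow-up"] * 5
reader_b = ["biopsy"] * 4 + ["follow-up"] + ["biopsy"] + ["follow-up"] * 4
kappa = cohen_kappa(reader_a, reader_b)
print(round(kappa, 2))
```

Here the readers agree on 8 of 10 cases while 5 of 10 agreements would be expected by chance, so kappa is (0.8 - 0.5) / (1 - 0.5) = 0.6.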

In the second analysis, we reviewed those cases in which the radiologists'
recommendation  of  biopsy  or  follow-up  was  altered  by  the  computer  aid.  These 
consisted  of  31%  of  the  total  cases.  Radiologists  were  more  likely  to  recommend 
additional  biopsies  when  the  computer  estimated  high  values  of  likelihood  of  malignancy, 
and  they  were  more  likely  to  drop  biopsy  recommendations  when  the  computer  estimated 
low  values  of  likelihood  of  malignancy.  The  overall  probability  of  recommending  an
additional  biopsy  was  similar  to  the  overall  probability  of  dropping  a  biopsy 
recommendation  (15%  versus  16%).  The  probability  of  recommending  an  additional 
biopsy  with  a  high  computer-estimated  likelihood  of  malignancy  was  similar  for 
malignant  cases  and  for  benign  cases  (26%  versus  28%).  However,  the  probability  of 
dropping  a  biopsy  recommendation  with  a  low  computer-estimated  likelihood  of 
malignancy  was  much  higher  for  benign  cases  than  for  malignant  cases  (39%  versus 
22%).  We  conclude  that  CAD  can  be  used  to  improve  radiologists'  ability  to  differentiate 
between  malignant  and  benign  clustered  microcalcifications  and  to  improve  radiologists' 
biopsy  recommendations. 

We  have  developed  an  automated  scheme  for  the  classification  of  clustered 
microcalcifications as malignant or benign. We have shown in previous studies that this
computer scheme is more accurate than radiologists in differentiating between
malignant  and  benign  microcalcifications.  We  have  also  shown  in  an  observer  study  that 
this  computer  aid  can  help  radiologists  improve  their  diagnostic  accuracy  and  improve 
their  biopsy  recommendations.  In  this  observer  study,  ten  radiologists  read  the 
mammograms  from  104  patients  with  and  without  our  computer  aid,  and  they  reported 
their  confidence  that  a  microcalcification  cluster  represented  a  malignancy  as  well  as 
their  recommendation  of  biopsy  or  follow-up. 

To  compare  computer-aided  diagnosis  (CAD)  with  double  readings  by 
radiologists,  we  conducted  a  comparative  study  using  data  from  the  observer  study.  We 
derived  radiologists'  double-reading  performance  post  hoc  from  their  independent  and 
unaided  single  reading  data  using  five  different  objective  rules  of  independent  double 
readings  and  another  rule  of  simulated-optimal  double  reading  that  assumed  that 
consultations  for  resolving  two  radiologists'  different  independent  diagnoses  always 
produce  the  correct  clinical  recommendation.  From  these  results  and  the  unaided  single 
reading  and  CAD  reading  data,  we  calculated  sensitivity  and  specificity  from  the
observers'  biopsy  recommendations  and  obtained  ROC  curves  from  their  diagnostic
confidence  ratings. 

We  found  that  the  unaided  single  reading  yielded  74%  sensitivity  and  32% 
specificity;  whereas  the  CAD  reading  had  87%  sensitivity,  42%  specificity,  and  appeared 
on  a  higher  ROC  curve  than  the  unaided  single  reading  (p  <  0.0001).  Five  methods  of 
formulating  independent  double  readings  generated  sensitivities  between  59%  and  89% 
and  specificities  between  50%  and  13%,  with  their  resulting  operating  points  appearing 
essentially  along  the  unaided  single-reading  ROC  curve.  The  result  of  the  simulated- 
optimal  double  reading,  however,  was  similar  to  that  of  CAD  reading,  with  89% 
sensitivity  and  50%  specificity. 

We conclude from this study that none of the real-world methods of double reading
that we examined improved diagnostic performance, whereas CAD reading did,
approaching the simulated-optimal performance.
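
The way an objective double-reading rule is derived post hoc from two independent single readings can be sketched as follows. The five rules used in the study are not listed here; the "either" and "both" rules below are hypothetical examples of such rules, and the readings are invented for illustration.

```python
def sensitivity_specificity(recommend_biopsy, is_malignant):
    """Sensitivity and specificity of a list of biopsy recommendations."""
    pairs = list(zip(recommend_biopsy, is_malignant))
    tp = sum(1 for r, m in pairs if r and m)
    fn = sum(1 for r, m in pairs if not r and m)
    tn = sum(1 for r, m in pairs if not r and not m)
    fp = sum(1 for r, m in pairs if r and not m)
    return tp / (tp + fn), tn / (tn + fp)


def double_read(reader_1, reader_2, rule):
    """Combine two independent readings with an objective rule."""
    if rule == "either":   # biopsy if at least one reader recommends it
        return [a or b for a, b in zip(reader_1, reader_2)]
    if rule == "both":     # biopsy only if both readers recommend it
        return [a and b for a, b in zip(reader_1, reader_2)]
    raise ValueError("unknown rule: " + rule)


# Hypothetical data: four malignant cases followed by four benign cases.
truth = [True, True, True, True, False, False, False, False]
reader_1 = [True, True, False, True, True, False, False, True]
reader_2 = [True, False, True, True, False, True, False, True]

print(sensitivity_specificity(reader_1, truth))
print(sensitivity_specificity(double_read(reader_1, reader_2, "either"), truth))
print(sensitivity_specificity(double_read(reader_1, reader_2, "both"), truth))
```

In this toy example the "either" rule raises sensitivity at the cost of specificity and the "both" rule does the opposite, which is the kind of trade-off along a single ROC curve described above.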

We have developed an automated computer technique that classifies clustered
microcalcifications  in  mammograms  as  malignant  or  benign.  We  have  shown  previously 
in  two  observer  studies  that  this  computer  technique  can  both  be  more  accurate  than 
radiologists  and  help  radiologists  to  be  more  accurate  in  differentiating  benign  from 
malignant  clustered  microcalcifications.  This  computer  classification  technique  was 
developed  on  digitized  screen-film  mammograms.  In  an  effort  to  develop  this  technique 
for  full-field  digital  mammograms,  we  conducted  a  study  of  this  technique  on  small-field 
digital  mammograms  obtained  during  stereotactic  biopsy  procedures.  The  goal  of  this 
work was not to analyze these small-field mammograms per se, but to analyze
mammograms that were obtained with a digital detector. The rationale is that we expect
the  findings  from  this  analysis  of  the  small-field  digital  mammograms  to  apply,  in 
principle,  to  full-field  digital  mammograms  as  well.  In  this  study,  we  analyzed  79  lesions, 
of which 33 were malignant and 46 were benign. Each of these cases typically consisted
of  more  than  one  image  and,  therefore,  we  analyzed  a  total  of  176  images,  of  which  56 
were  of  the  malignant  lesions  and  120  were  of  the  benign  lesions.  We  applied  the  same
computer  technique  developed  based  on  digitized  screen-film  mammograms  using  the 
same  computer-extracted  image  features.  The  computer  technique  achieved  an  Az  value 
of  0.84  for  the  176  images  and  0.90  for  the  79  lesions.  In  comparison,  radiologists  who 
evaluated these lesions prior to biopsy achieved an Az value of 0.76 for the 79 lesions.
Therefore,  our  computer  technique  outperformed  the  radiologists  in  classifying  these 
breast  lesions  as  malignant  or  benign.  We  concluded  from  this  study  that  our  computer 
technique  can  potentially  classify  clustered  microcalcifications  accurately  as  malignant 
or benign in mammograms acquired with digital detectors.

(4)  Development  of  automated  scheme  for  characterization  of  masses 

The automated classification of masses begins by segmenting the lesions using
grey-level region growing applied to a 512x512 ROI (region of interest) centered on the
lesion  after  background-trend  correction  (using  a  second  order  polynomial)  and 
histogram  equalization.  The  grey-level  threshold  value  is  determined  from  a  “transition 
point.”  The  transition  point  is  the  grey  level  for  which  there  is  a  discontinuous  decrease 
in  the  circularity  and  a  corresponding  discontinuous  increase  in  size  of  the  grown  lesion 
(ref.  39). 

From  the  segmented  lesion,  four  features  related  to  the  degree  of  spiculation, 
margin  sharpness,  density  of  each  mass,  and  the  texture  within  the  mass  are  extracted 
automatically  from  the  neighborhoods  of  mass  regions.  The  techniques  for  extracting 
these four features are described in ref. 39. Because of its strong ability to differentiate
benign  from  malignant  masses,  degree  of  spiculation  is  first  used  in  a  rule-based 
technique  (i.e.,  a  threshold  is  applied  to  the  degree  of  spiculation  measure).  Those 
masses  that  have  a  spiculation  measure  lower  than  a  threshold  value  are  then  subjected  to 
the  ANN,  where  the  remaining  features  are  used  as  input.  The  architecture  of  the  ANN  is 
three  input  units,  two  hidden  units,  and  one  output  unit.  The  spiculation  measure  and  the
output  of  the  ANN  are  used  to  determine  the  likelihood  of  malignancy. 
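
The two-stage decision can be sketched as below: a threshold on the spiculation measure, followed by a network with three input units, two hidden units, and one output unit for the remaining masses. The sigmoid activation, the weights, the threshold value, and the choice to return a likelihood of 1.0 when the rule fires are all assumptions for illustration; they are not the trained values or the exact combination rule of our scheme.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def ann_3_2_1(features, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass of a network with 3 inputs, 2 hidden units, and 1 output."""
    hidden = [
        sigmoid(sum(w * f for w, f in zip(weights, features)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


def hybrid_likelihood(spiculation, other_features, threshold, network):
    """Rule on spiculation first; the ANN is used only below the threshold."""
    if spiculation >= threshold:
        return 1.0  # assumption: the rule alone labels the mass as likely malignant
    return network(other_features)


def zero_network(features):
    """Hypothetical untrained network: all weights and biases set to zero."""
    return ann_3_2_1(features, [[0.0] * 3, [0.0] * 3], [0.0, 0.0], [0.0, 0.0], 0.0)


# other_features: margin sharpness, density, and texture (hypothetical values).
print(hybrid_likelihood(0.9, [0.2, 0.5, 0.1], threshold=0.5, network=zero_network))
print(hybrid_likelihood(0.1, [0.2, 0.5, 0.1], threshold=0.5, network=zero_network))
```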

Using  a  database  of  95  mammograms  containing  masses  from  65  patients  (all  but 
one  having  been  biopsied  for  the  suspicion  of  breast  cancer),  the  performance  of  the  mass 
classification  technique  was  measured  and  compared  to  the  results  of  interpretations  by 
radiologists  reading  the  same  cases.  Using  ROC  analysis,  the  computer  classification 
scheme  yielded  an  Az  value  of  0.94,  similar  to  that  of  an  experienced  mammographer 
(Az=0.91)  and  statistically  significantly  higher  than  the  average  performance  of  the 
radiologists  with  less  mammographic  experience  (Az=0.80).  With  the  database  we  used, 
the  computer  scheme  achieved,  at  100%  sensitivity,  a  positive  predictive  value  of  83%, 
which  was  12%  higher  than  that  of  the  experienced  mammographer  and  21%  higher  than 
that  of  the  average  performance  of  the  less  experienced  mammographers  at  a  p-value  of 
less  than  0.001. 
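
The positive predictive value at 100% sensitivity can be read off by placing the decision threshold at the lowest computer output among the malignant cases, as in the sketch below. The scores are hypothetical and the function name is an assumption.

```python
def ppv_at_full_sensitivity(scores, is_malignant):
    """PPV when the threshold is set just low enough to catch every cancer."""
    threshold = min(s for s, m in zip(scores, is_malignant) if m)
    true_pos = sum(1 for s, m in zip(scores, is_malignant) if s >= threshold and m)
    false_pos = sum(1 for s, m in zip(scores, is_malignant) if s >= threshold and not m)
    return true_pos / (true_pos + false_pos)


# Hypothetical likelihood-of-malignancy outputs (illustration only).
scores = [0.9, 0.8, 0.6, 0.7, 0.4, 0.2]
is_malignant = [True, True, True, False, False, False]
print(ppv_at_full_sensitivity(scores, is_malignant))
```

With the threshold at 0.6, all three malignant cases and one benign case are called positive, giving a positive predictive value of 3/4.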

The  robustness  of  the  computerized  scheme  to  case-variation  was  evaluated  on  an 
independent database consisting of 110 new cases (50 malignant and 60 benign).
Mammograms  from  the  independent  database  were  digitized  twice  using  two  different 
laser  scanners  (Konica  LD  4500  and  Lumiscan  100)  in  order  to  evaluate  the  robustness  of 
the  scheme  to  the  variation  in  digitization  techniques.  Using  ROC  analysis,  the 
classification scheme achieved Az values of 0.82 (Konica) and 0.81 (Lumiscan) on the
independent database. Results from statistical analyses showed that the differences in the
performance due to the case-variation between the training and independent databases
and  the  variation  in  film  digitization  techniques  were  not  statistically  significant  (p=0.14, 
0.10  and  0.76).  The  independent  evaluation  of  the  computerized  scheme  for  the 
classification  of  benign  and  malignant  masses  showed  that  the  computerized 
classification  scheme  is  robust  to  the  variations  in  case-difficulty  and  digitization 
techniques. 

In our computerized classification method for estimating the likelihood of
malignancy of mammographic masses, we investigated two different classifiers: an
artificial neural network (ANN) and a hybrid system (a one-step rule-based method followed by an
artificial  neural  network).  In  order  to  understand  the  difference  between  the  two 
classifiers,  we  investigated  their  learning  and  decision-making  processes  by  studying  the 
relationships  between  the  input  features  and  the  outputs.  A  correlation  study  showed  that 
the  outputs  from  the  ANN-alone  method  correlated  strongly  with  one  of  the  input 
features  (spiculation)  (r  =  0.91),  whereas  the  correlation  coefficients  for  the  other  features 
ranged  from  0.19  to  0.40.  This  strong  correlation  between  the  ANN  output  and 
spiculation measure indicates that the learning and decision-making processes of the ANN-
alone method were dominated by the spiculation measure. We found that, with a limited
database, the presence of a dominant feature hinders an ANN from learning the
significance of the other features. Our hybrid system, which initially applies a rule
concerning the value of the spiculation measure prior to employing an ANN, prevents
over-learning from the dominant feature and performed better than the ANN-alone
method  in  merging  the  computer-extracted  features  into  a  correct  diagnosis  regarding  the 
malignancy  of  the  masses. 
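
The correlation study amounts to computing a Pearson correlation coefficient between each input feature and the classifier output over the cases. A minimal sketch, with hypothetical values rather than our data:

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical feature values and classifier outputs for four cases.
spiculation = [1.0, 2.0, 3.0, 4.0]
ann_output = [2.0, 4.0, 6.0, 8.0]
print(pearson_r(spiculation, ann_output))
```

A coefficient close to 1 for one feature and much lower values for the others is the pattern that indicated dominance of the spiculation measure in the ANN-alone method.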

The  effectiveness  of  the  computerized  classification  scheme  as  an  aid  to 
radiologists  in  the  task  of  differentiating  between  benign  and  malignant  masses  was 
evaluated  in  a  preliminary  observer  study.  The  preliminary  observer  study  was 
conducted with 20 selected cases and 128 radiologists. For each case, the observer
viewed  the  CC,  MLO  and  special  views  (e.g.,  magnified  or  spot  compression  view)  of  the 
mass  lesion  on  a  monitor,  along  with  a  minified  image  of  all  four  standard  views.  The 
observers  were  asked  to  give  their  confidences  regarding  the  likelihood  of  malignancy  for 
each  case,  first  without  and  then  with  the  computer  output  of  an  estimated  likelihood  of 
malignancy.  As  many  as  6  training  cases  were  shown  to  the  observers  before  the  actual 
study. The 20 cases were randomized differently for each observer. The average
performance  of  the  radiologists  in  terms  of  Az  value  was  0.89  and  0.94  without  and  with 
the  computer  aid,  respectively.  Results  from  the  paired  t-test  showed  that  the  difference 
in  Az  was  statistically  significant  (p-value  <  0.0001).  The  preliminary  results  from  an 
observer  study  showed  that  a  significant  improvement  in  the  performance  of  radiologists 
was  achieved  in  the  classification  of  benign  and  malignant  masses  when  computer  aid 
was  used. 

We  evaluated  our  computerized  classification  method,  which  was  initially 
developed  on  digitized  screen/film  mammograms,  on  a  large  database  of  digital 
mammograms. We collected 110 prospective cases (212 images) from a LORAD
stereotactic  imaging  system  that  had  initially  been  obtained  for  needle  localization  or  core 
biopsy  of  a  suspect  mass  lesion.  The  database  consisted  of  44  malignant  cases  and  66 
benign  cases.  The  computer  classification  method  includes  the  automated  segmentation 
of  the  mass  lesions  from  the  breast  parenchyma,  the  automated  extraction  of  lesion 
features,  and  the  automated  classification  of  the  suspect  lesion  into  an  estimate  of  the 
likelihood  of  malignancy.  A  Bayesian  neural  network  (BANN)  was  used  to  merge  the 
four  features  of  spiculation,  margin  sharpness,  average  gray  level,  and  texture.  The 
BANN uses regularization to prevent overtraining of the network. Without retraining, the
computer method developed on the screen/film database yielded an Az of 0.72 on the digital
mammography  database.  After  retraining  of  the  BANN,  the  Az  increased  to  0.91,  similar 
to  that  obtained  from  the  radiologists'  suspicion  ratings  of  the  lesions  (0.92).  Further 
investigation  of  the  features  showed  that  the  spiculation  feature  performed  better  on  the 
screen/film  database,  whereas  the  texture  feature  performed  better  on  the  digital 
mammography  database.  Due  to  differences  in  the  physical  characteristics  of  the  two 
image acquisition systems, feature values and their merging by classifiers need
to  be  carefully  optimized. 

To evaluate the effectiveness of a computerized classification method as an aid to
radiologists reviewing clinical mammograms for which the diagnoses were unknown to
both the radiologists and the computer, six mammographers and six community
radiologists participated in an observer study. These 12 radiologists interpreted, without
and with the computer aid, 110 cases that were unknown to both the 12 radiologist
observers and the trained computer classification scheme. The radiologists'
performances in differentiating between benign and malignant masses without and with
the  computer  aid  were  evaluated  using  ROC  analysis.  Two-tailed  p-values  were 
calculated  for  Student’s  t-test  to  indicate  the  statistical  significance  of  the  differences  in 
performances  with  and  without  the  computer  aid. 
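
The evaluation described above can be sketched as follows. The report does not state how Az was estimated, so this sketch substitutes the nonparametric (Mann-Whitney) estimate of the area under the ROC curve, and it computes only the paired t statistic, not the p-value. The function names and all ratings and Az values in the usage lines are hypothetical.

```python
import math

def empirical_auc(malignant_ratings, benign_ratings):
    """Mann-Whitney estimate of the area under the ROC curve (ties count one half)."""
    wins = 0.0
    for m in malignant_ratings:
        for b in benign_ratings:
            if m > b:
                wins += 1.0
            elif m == b:
                wins += 0.5
    return wins / (len(malignant_ratings) * len(benign_ratings))

def paired_t_statistic(without_aid, with_aid):
    """t statistic and degrees of freedom for the per-reader differences (with - without)."""
    diffs = [w - wo for wo, w in zip(without_aid, with_aid)]
    n = len(diffs)
    mean = sum(diffs) / n
    variance = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(variance / n), n - 1

# Hypothetical suspicion ratings for one reader (higher = more suspicious).
auc = empirical_auc([3, 4, 5], [1, 2, 3])

# Hypothetical per-reader Az values without and with the computer aid.
t_stat, dof = paired_t_statistic([0.90, 0.92, 0.94], [0.93, 0.96, 0.96])
print(auc, t_stat, dof)
```

A two-tailed p-value would then follow from the t distribution with the returned degrees of freedom.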

When the computer aid was used, the average performance of the 12 radiologists
improved, as indicated by an increase in Az from 0.93 to 0.96 (p-value = 0.0002), by an
increase in A'z from 0.56 to 0.72 (p-value = 0.0002), and by an increase in sensitivity
from 94% [...] CI = (-0.054, 0.026)). When we analyzed results from the mammographers and
community  radiologists  as  separate  groups,  a  larger  improvement  was  demonstrated  for 
the  community  radiologists.  Computer-aided  diagnosis  can  potentially  help  radiologists 
improve  their  diagnostic  accuracy  in  the  task  of  differentiating  between  benign  and 
malignant  masses  seen  on  mammograms. 

(5)  Development  of  prototype  CAD  workstation 

Our intelligent workstation consists of an IBM RISC 6000 Powerstation 590, a
Konica LD4500 film digitizer, an Alphatronix Inspire magneto-optical jukebox, two
Imlogix 1000 CRT monitors, and a Seikosha VP4500 thermal printer. The system has
been used in the clinical reading area of the Department of Radiology since November 8,
1994.

Each day, all screening mammograms (four views per case) are digitized. As the
films are being digitized, using a 100-micron pixel size and 1024 grey levels, the
microcalcification detection program is run on-line in parallel. The mass detection
program is run off-line overnight, since the films are not reviewed until the next day.
After all four films have been analyzed, the results of the microcalcification detection
program are displayed in a single 1024x1280 image as a collage of four 512x620 images,
with arrows annotating each image to indicate the computer results. The
results are then recorded on thermal paper, on which the radiologists can make notes
and comments. The results of the mass detection program are printed in the same
format the next morning. A full case of four films can be processed in less than 5 minutes.
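
The digitization and display figures quoted above can be checked with a short sketch. It assumes that the quoted image sizes are width x height, that the four views are tiled two per row, and that a film measures 18 cm x 24 cm; none of these three points is stated in the report, and the function names are hypothetical.

```python
def bits_per_pixel(grey_levels):
    """Smallest number of bits that can encode the given number of grey levels."""
    return (grey_levels - 1).bit_length()

def pixels_along(length_mm, pixel_size_um):
    """Number of pixels covering a film edge of the given length."""
    return round(length_mm * 1000 / pixel_size_um)

def collage_offsets(tile_width, tile_height, columns, count):
    """Top-left (x, y) offset of each tile, filled row by row."""
    return [((i % columns) * tile_width, (i // columns) * tile_height)
            for i in range(count)]

# Assumed 18 cm x 24 cm film; the report does not give the film size.
film_pixels = (pixels_along(180, 100), pixels_along(240, 100))

# Four 512x620 views tiled two per row (assumed layout).
offsets = collage_offsets(512, 620, columns=2, count=4)
used_width = max(x for x, _ in offsets) + 512
used_height = max(y for _, y in offsets) + 620

print(bits_per_pixel(1024), film_pixels, offsets, (used_width, used_height))
```

Under these assumptions, 1024 grey levels correspond to 10 bits per pixel, and the four tiles occupy 1024x1240 of the 1024x1280 display image.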

Recently,  we  have  made  a  major  modification  in  the  existing  workstation  by 
incorporating  a  touch-screen  CRT  monitor  to  display  the  results  of  the  computer  analyses 
to  the  radiologist.  This  will  replace  the  thermal  paper  copy  and  will  facilitate  recording 
of radiologists' findings. The touch-screen system is used for recording the location of
lesions that the radiologist believes are malignant. A digital copy of the four views will be
displayed on a monitor with no computer results. The radiologist, after reading the
original film mammograms, will touch the screen of the CRT monitor to indicate
region(s) in the images that may contain cancer. If the radiologist considers no cancerous
lesion to be present in the image, he/she will enter this initial normal finding into the
workstation by touching a button displayed on the CRT monitor outside
the breast region. Once this is done, the computer results will be displayed on the CRT
monitor. The radiologist, after reviewing the computer results together with the
original films, will again use the touch screen to indicate suspicious region(s) in the
images on the monitor if the location of the malignant region(s) found with the
computer output differs from the initial location, or if the initial finding was normal.

(6)  Clinical  evaluation  of  CAD  workstations 

As of December 2000, over 25,000 cases have been analyzed with our CAD
workstation at the University of Chicago. We are analyzing the sensitivity and
false-positive rate of the workstation for the first three years (12,670 cases). With follow-up of
up to five years for some patients, 79 women have developed breast cancer in our study
cohort. Of the 79 cancers, 61 were initially detected on a screening mammogram. The
remaining 18 cases were initially detected either on a diagnostic mammogram or by physical
examination. Of these, 14 had true-negative screening mammograms even in
retrospective review, and 4 were read as negative although the cancer was visible in retrospect.

Of the 65 mammographically visible cancers, the computer identified the cancer in 44
cases (31/46 for masses and 13/19 for calcifications). Of the 79 cancer patients, 42 had a
negative  screening  mammogram  that  was  included  in  our  study  cohort.  Retrospective 
review  of  these  cases  showed  that  19  were  mammographically  occult.  In  the  23  cases 
that  had  a  subtle  lesion  visible  in  retrospect,  the  computer  identified  12  of  them.  Thus, 
the  computer  was  able  to  detect  52%  of  “missed”  cancers  approximately  one  year  prior  to 
diagnosis.  The  cases  containing  a  missed  cancer  are  being  used  in  an  observer  study  to 
see  if  radiologists  can  detect  more  cancers  when  they  use  the  computer  aid. 
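
The detection fractions quoted in this paragraph can be reproduced with the following sketch, which uses only the counts given in the text. The helper name `percent` and the variable names are hypothetical.

```python
def percent(numerator, denominator):
    """Return numerator/denominator expressed as a percentage."""
    return 100.0 * numerator / denominator

# Counts quoted in the text.
visible_total, visible_detected = 65, 44
mass_total, mass_detected = 46, 31
calc_total, calc_detected = 19, 13
prior_negative, occult, subtle, subtle_detected = 42, 19, 23, 12

# The quoted subgroups add up to the quoted totals.
assert mass_total + calc_total == visible_total
assert mass_detected + calc_detected == visible_detected
assert occult + subtle == prior_negative

print(f"all visible cancers: {percent(visible_detected, visible_total):.0f}%")  # 68%
print(f"masses:              {percent(mass_detected, mass_total):.0f}%")        # 67%
print(f"calcifications:      {percent(calc_detected, calc_total):.0f}%")        # 68%
print(f"missed cancers:      {percent(subtle_detected, subtle):.0f}%")          # 52%
```

The last line corresponds to the 52% of "missed" cancers cited above (12 of the 23 cases with a lesion visible in retrospect).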

The false-positive rate of the computer schemes is currently 2.15 false
masses per image and 1.0 false clusters per image. The clustered microcalcification
false-positive rate decreased from approximately 1.7 to 1.0 when the screening clinic
moved to a new location within the hospital. It appears that the new darkroom is cleaner
than the old one and that there are therefore fewer film artifacts in the images.

The R2 Technology M1000 Image Checker was installed at Grant Square
Imaging in early April 1998 to support the Demonstration Project. Since that time, all
mammographic interpretations performed there have been done with computer assistance.
Installation of a new radiology information system at the site at approximately the same
time has facilitated data collection. In addition to basic mammography audit data, the
radiologists now also record all cases in which computer assistance altered patient care,
most typically resulting in a callback for a computer-detected finding.

The baseline data for interpretation of mammograms without computer assistance
at Grant Square extend from 1-1-97 until 3-31-98. All mammographic interpretations
from 1-1-97 to 12-31-97 corresponding to BIRADS categories 4 and 5 have been tracked
to this point. Results for this year will be finalized after physicians' offices have been
contacted a third time about several cases.

In a positive development that should add greatly to the number of examinations
included in the study, LaGrange Memorial Hospital (LMH) has decided to purchase an M1000.
The baseline period for interpretation of mammograms without computer assistance at
LMH will be 1-1-97 to approximately 12-31-98. The audit of radiologists' performance
for 1-1-97 to 12-31-97 is essentially complete. Highlights include a volume of
approximately 7500 cases; a cancer detection rate of 4.1 per thousand; a 73% minimal
cancer detection rate (Tis, T1a and T1b lesions); and a 7% callback rate. The protocol for
procedure interpretation with computer assistance will be the same at Grant Square and
LMH.

The R2 Technology Image Checker was installed at Grant Square Imaging in
early April 1998 to support the demonstration project. Since 8-1-98, radiologists at Grant
Square have recorded all cases in which information from the Image Checker has resulted
in additional patient evaluation (i.e., for findings not initially appreciated by the
radiologist). An audit of mammogram interpretations with the R2 system at
Grant Square is ongoing. This audit is now largely complete through 8-30-01 (through
5-30-00 as of the last report). A total of approximately 9700 mammograms were read
between 8-1-98 and 8-30-01. In 91 of these, information from the Image Checker
changed the interpretation. A total of 12 biopsy recommendations were generated on the
basis of these 91 cases. Eight of these biopsies have been performed according to our
records. Three cancers were found as a result of the biopsy recommendations (3/12 or
25%). Based on a continued overall cancer detection rate of 3/1000 at Grant Square,
this corresponds to an approximately 10% increase in yield in early cancer detection
related to the R2 system. Freer et al. have recently reported a 20% increased yield with
CAD using the same basic experimental design. The significance of these differing
results has not yet been established. The average number of mammogram interpretations
at Grant Square is approximately 10/day, a much lower volume than in the Freer
practice. It is certainly possible, however, that the utility of computer-aided detection
depends on case volume.
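
The arithmetic behind the approximately 10% figure can be sketched as follows. The sketch assumes that the 3/1000 detection rate describes cancers found without the CAD prompt, so that the three CAD-prompted cancers are counted as additional; the report does not spell this out. The function and variable names are hypothetical.

```python
def relative_yield_increase(exams, baseline_rate_per_1000, cad_prompted_cancers):
    """Expected baseline cancers and the percent increase from CAD-prompted cancers."""
    expected = exams * baseline_rate_per_1000 / 1000.0
    return expected, 100.0 * cad_prompted_cancers / expected

# Figures quoted in the text: about 9700 mammograms, 3/1000 detection rate,
# 3 cancers found through CAD-prompted biopsy recommendations.
expected_cancers, increase_percent = relative_yield_increase(9700, 3, 3)

# Share of interpretations changed by CAD (91 of about 9700) and the
# fraction of CAD-prompted biopsy recommendations that found cancer (3 of 12).
changed_percent = 100.0 * 91 / 9700
biopsy_yield_percent = 100.0 * 3 / 12

print(f"expected baseline cancers: {expected_cancers:.1f}")
print(f"increase in yield: {increase_percent:.1f}%")
print(f"interpretations changed: {changed_percent:.2f}%")
print(f"cancers per biopsy recommendation: {biopsy_yield_percent:.0f}%")
```

Under the stated assumption, about 29 cancers would be expected at baseline, so three additional cancers correspond to an increase of roughly 10%.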

We  have  not  detected  a  noticeable  change  in  the  utility  of  the  Image  Checker  with 
time,  as  assessed  by  the  proportion  of  interpretations  that  are  changed.  Between  8-1-98 
and  5-1-00,  approximately  0.95%  of  interpretations  were  changed  due  to  CAD 
information.  Between  5-1-00  and  8-30-01,  approximately  0.92%  of  interpretations  were 
changed. 

We continue to attempt to obtain objective information regarding absolute cancer
detection rates before and after the acquisition of the R2 Image Checker. Efforts at
Resurrection Hospital have been reported previously and are ongoing. Given the
relatively small number of cancers in the Grant Square data, an attempt is being made to
extend the “baseline” cancer detection rate (before R2) through auditing of data from
1995 and 1996.

2.3  Recommendations  in  relation  to  the  Statement  of  Work 

Our  progress  follows  closely  the  proposed  statement  of  work.  Therefore,  we  do 
not  recommend  a  change  in  the  proposed  statement  of  work. 

3.  CONCLUSIONS 

We have made significant progress in the development of various CAD schemes
for detection and characterization of breast lesions. Evaluation of our CAD workstation
and collection of mammographic audit data are under way. Therefore, it is
expected that this project will produce useful results concerning the impact of CAD
schemes on the detection of additional breast cancers.